ReadMe File for "Did You Hear About Clarence Thomas?" by Christopher N. Krewson, Jessica A. Schoenherr, and Marcy Shieh

RP_ReplicationFile.R contains the code needed to replicate the various figures and tables included in our manuscript. The code calls upon various datasets (.RData files).

We ran our code on R version 4.3.3.

Figure 1 (see code lines 1-6) shows the results of our sharp regression discontinuity analysis, with time as the running variable. The cutpoint of 0 represents the day of the publication. The analysis tests the causal effect of the publication by treating its date as an exogenous shock. The dependent variable ("Total") is our measure of information seeking. The bandwidth (the window of observations on each side of the cutpoint used to estimate the effect at the cutpoint) is determined by the Imbens-Kalyanaraman method.
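For readers unfamiliar with the design, the following is a minimal sketch of a sharp regression discontinuity in time, written in base R on simulated data. It is not the code in RP_ReplicationFile.R. The fixed bandwidth h, the triangular kernel weights, and the simulated values of "Total" are all assumptions made for illustration; the replication code selects the bandwidth with the Imbens-Kalyanaraman method instead.

```r
# Sketch of a sharp RD in time using base R only (simulated data).
set.seed(1)
day   <- -30:30                      # running variable; 0 = publication day
after <- as.numeric(day >= 0)        # treatment indicator (sharp design)
Total <- 2 + 0.05 * day + 3 * after + rnorm(length(day), sd = 0.1)
dat   <- data.frame(day, after, Total)

h   <- 10                            # ASSUMPTION: fixed bandwidth, not Imbens-Kalyanaraman
win <- subset(dat, abs(day) < h)     # keep observations inside the window
win$w <- 1 - abs(win$day) / h        # triangular kernel weights

# Separate linear trends on each side of the cutpoint;
# the coefficient on `after` is the estimated jump at day 0.
fit <- lm(Total ~ after * day, data = win, weights = w)
rd_estimate <- unname(coef(fit)["after"])
print(rd_estimate)
```

Because the simulated jump is 3, the printed estimate should land close to that value.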

Figure 2 (see code lines 8-19) shows the cumulative percentage of our sample showing interest in the Court for each day since the publication, as well as the change in the cumulative percentage each day after the publication. dat$prop is the cumulative percentage of the sample that had shown interest in the Court by each day (with each day identified in the vector dat$rdd).
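As an illustration of how a cumulative percentage and its daily change can be computed, here is a small base R sketch on made-up data. The vector first_day and the column change are hypothetical names. The columns rdd and prop only mirror the names mentioned above and are not necessarily built this way in the replication code.

```r
# Hypothetical toy data: day (relative to publication) on which each of 10
# respondents first showed interest; NA means the respondent never did.
first_day <- c(0, 0, 1, 1, 1, 3, 4, NA, NA, NA)
n    <- length(first_day)
days <- 0:5

# Cumulative percentage of the sample that has shown interest by each day.
prop <- sapply(days, function(d) sum(first_day <= d, na.rm = TRUE) / n) * 100

# Day-over-day change in the cumulative percentage.
change <- c(prop[1], diff(prop))

dat <- data.frame(rdd = days, prop = prop, change = change)
print(dat)
```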

The regression results in Appendix A (see code lines 21-22) are obtained simply by using the summary command on the object containing the estimates from our regression discontinuity analysis (stored in the object rdd_simple).

Appendix B contains a table of regression results (see code lines 24-41). Each regression is based on a different set of data. Be sure to run the code in order, so that each regression is run on the correct subset of data. The regressions are all logistic regression models of measures of information seeking on various respondent characteristics. The stargazer package creates LaTeX output that matches the content and ordering of the regression results in the appendix.
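The sketch below shows, on simulated data, what fitting the same logistic regression to different subsets looks like in base R. The variables age, college, wave, and sought are hypothetical and do not correspond to variables in our datasets. The sketch passes each subset explicitly through the data argument, which is one way to keep every model tied to its own data.

```r
# Simulated respondents (all variable names are hypothetical).
set.seed(2)
n <- 400
resp <- data.frame(
  age     = round(runif(n, 18, 80)),
  college = rbinom(n, 1, 0.4),
  wave    = sample(c("before", "after"), n, replace = TRUE)
)
resp$sought <- rbinom(n, 1, plogis(-1 + 0.02 * resp$age + 0.5 * resp$college))

# Same logistic specification, fit separately on each subset.
m_before <- glm(sought ~ age + college, family = binomial,
                data = subset(resp, wave == "before"))
m_after  <- glm(sought ~ age + college, family = binomial,
                data = subset(resp, wave == "after"))

print(round(coef(summary(m_before)), 3))
print(round(coef(summary(m_after)), 3))
```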

Appendix C includes the output for our sharp regression discontinuity analysis (see code lines 43-49) using the Google Trends data, with time as the running variable. The cutpoint of 0 represents the day of the publication. The analysis tests the causal effect of the publication by treating its date as an exogenous shock. The dependent variable ("count") is our measure of information seeking. The bandwidth is determined by the Imbens-Kalyanaraman method.

Appendix F includes the demographic information of our respondents. The code used to generate this table is on lines 51-62.

Appendix H includes trends in web engagement for our entire sample (see code lines 64-68). The webdata.RData file is a large file containing all internet engagements recorded by YouGov for our sample. It may take a few seconds longer to load than the other files. The commands create barplots showing the frequency of web engagements using different aggregations of time.
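To illustrate what "different aggregations of time" means, the following base R sketch counts a handful of made-up engagement dates by day and by week and draws a barplot for each. The visits vector is hypothetical and is unrelated to the contents of webdata.RData.

```r
# Hypothetical dates for 8 web engagements.
visits <- as.Date(c("2023-04-03", "2023-04-03", "2023-04-04", "2023-04-06",
                    "2023-04-06", "2023-04-06", "2023-04-11", "2023-04-12"))

# Frequency of engagements per day (days with no engagements are omitted).
by_day <- table(visits)

# Frequency per week; cut() labels each week by the Monday that starts it.
by_week <- table(cut(visits, breaks = "week"))

# Avoid writing a plot file when the script is run non-interactively.
if (!interactive()) pdf(NULL)

barplot(by_day,  main = "Engagements per day",  las = 2)
barplot(by_week, main = "Engagements per week", las = 2)
```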

Finally, code lines 70-126 plot predicted levels of information seeking (and associated confidence intervals) using the logistic regression models estimated earlier in the code. These results are included in Appendix I of our manuscript.
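One common way to obtain predicted probabilities with confidence intervals from a logistic regression is sketched below in base R, using simulated data. The model, the variable age, and the 1.96 multiplier for approximate 95% intervals are assumptions for illustration; the replication code may construct its intervals differently.

```r
# Simulated data and a simple logistic regression (hypothetical variables).
set.seed(3)
n      <- 500
age    <- round(runif(n, 18, 80))
sought <- rbinom(n, 1, plogis(-2 + 0.03 * age))
fit    <- glm(sought ~ age, family = binomial)

# Predict on the link (log-odds) scale, build the interval there,
# then transform back to probabilities so the bounds stay within [0, 1].
newdat <- data.frame(age = seq(20, 80, by = 10))
pr <- predict(fit, newdata = newdat, type = "link", se.fit = TRUE)
newdat$fit   <- plogis(pr$fit)
newdat$lower <- plogis(pr$fit - 1.96 * pr$se.fit)
newdat$upper <- plogis(pr$fit + 1.96 * pr$se.fit)
print(round(newdat, 3))

# Avoid writing a plot file when the script is run non-interactively.
if (!interactive()) pdf(NULL)

# Point estimates with vertical bars for the approximate 95% intervals.
plot(newdat$age, newdat$fit, type = "b", ylim = c(0, 1),
     xlab = "Age", ylab = "Predicted probability")
arrows(newdat$age, newdat$lower, newdat$age, newdat$upper,
       angle = 90, code = 3, length = 0.05)
```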
